(WIP)Feat(benchmark): Add benchmark/RAG : RAG system evaluation framework #825

Draft
sponge225 wants to merge 9 commits into volcengine:main from sponge225:feat/rag

Conversation

@sponge225
Contributor

Description

RAG benchmark

RAG benchmark is a framework for evaluating the performance of OpenViking's RAG (Retrieval-Augmented Generation) system. It supports multiple datasets and multiple evaluation metrics.

Features

  • Supports the Locomo, FinanceBench, Qasper, and SyllabusQA datasets
  • Complete evaluation pipeline: data preparation → vector retrieval → LLM generation → auto-grading
  • Evaluation metrics: Recall, F1 Score, Accuracy
  • Flexible YAML configuration
  • Extensible design
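
As a rough illustration of the three metrics listed above, here is a minimal Python sketch. It is not taken from this PR: the function names (`recall`, `token_f1`, `accuracy`), the whitespace tokenization, and the assumption that the grader produces one boolean verdict per answer are all hypothetical choices made for the example, and the framework's actual definitions may differ.

```python
from collections import Counter


def recall(retrieved_ids, relevant_ids):
    """Fraction of the relevant items that appear among the retrieved items."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(relevant & set(retrieved_ids)) / len(relevant)


def token_f1(prediction, reference):
    """Token-overlap F1 between a generated answer and a reference answer.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it occurs in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    rec = overlap / len(ref_tokens)
    return 2 * precision * rec / (precision + rec)


def accuracy(verdicts):
    """Share of answers marked correct (True) by the grader."""
    verdicts = list(verdicts)
    return sum(verdicts) / len(verdicts) if verdicts else 0.0
```

In this sketch, `recall` would score the vector retrieval step, while `token_f1` and `accuracy` would score the generated answers after auto-grading.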

See benchmark/RAG/README.md for detailed documentation.

Related Issue

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
    • Linux
    • macOS
    • Windows

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Screenshots (if applicable)

Additional Notes

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@github-actions

Failed to generate code suggestions for PR

@sponge225 sponge225 changed the title Feat(benchmark): Add benchmark/RAG : RAG system evaluation framework (WIP)Feat(benchmark): Add benchmark/RAG : RAG system evaluation framework Mar 20, 2026
@sponge225 sponge225 marked this pull request as draft March 20, 2026 12:55
Labels

None yet

Projects

Status: Backlog

2 participants